Dongming Wu

X-Stream: Exploring MLLMs as Multiplexers for Multi-Stream Understanding

Jun 01, 2026
Via arXiv

From Web to Pixels: Bringing Agentic Search into Visual Perception

May 12, 2026
Via arXiv

AutoFly: Vision-Language-Action Model for UAV Autonomous Navigation in the Wild

Feb 10, 2026
Via arXiv

SpaceVista: All-Scale Visual Spatial Reasoning from mm to km

Oct 10, 2025
Via arXiv

RAGNet: Large-scale Reasoning-based Affordance Segmentation Benchmark towards General Grasping

Jul 31, 2025
Via arXiv

Grounding Beyond Detection: Enhancing Contextual Understanding in Embodied 3D Grounding

Jun 05, 2025
Via arXiv

Cognitive Disentanglement for Referring Multi-Object Tracking

Mar 14, 2025
Via arXiv

DrivingSphere: Building a High-fidelity 4D World for Closed-loop Simulation

Nov 18, 2024
Via arXiv

Bootstrapping Referring Multi-Object Tracking

Jun 07, 2024
Via arXiv

Is a 3D-Tokenized LLM the Key to Reliable Autonomous Driving?

May 28, 2024
Via arXiv